Performance Analysis of FEM Algorithms on GPU and Many-Core Architectures
نویسندگان
چکیده
The roadmaps of the leading supercomputer manufacturers are based on hybrid systems, which consist of a mix of conventional processors and accelerators. This trend is mainly due to the fact that the power consumption cost of the future cpu-only Exascale systems will be unsustainable, thus accelerators such as graphic processing units (GPUs) and many-integrated-core (MIC) will likely be the integral part of the TOP500 (http://www.top500.org/) supercomputers, beyond 2020. The emerging supercomputer architecture will bring new challenges for the code developers. Continuum mechanics codes will particularly be affected, because the traditional synchronous implicit solvers will probably not scale on hybrid Exascale machines. In the previous study [1], we reported on the performance of a conjugate gradient based mesh motion algorithm [2] on Sandy Bridge, Xeon Phi, and K20c. In the present study we report on the comparative study of finite element codes, using PETSC and AmgX solvers on CPU and GPUs, respectively [3,4]. We believe this study will be a good starting point for FEM code developers, who are contemplating a CPU to accelerator transition.
منابع مشابه
Scalability of parallel finite element algorithms on multi-core platforms
The speedup of element-by-element FEM algorithms depends not only on peak processor performance but also on access time to shared mesh data. Eliminating memory boundness would significantly speed up unstructured mesh computations on hybrid multi-core architectures, where the gap between processor and memory performance continues to grow. The speedup can be achieved by ordering unknowns so that ...
متن کاملArchitecting the finite element method pipeline for the GPU
The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core stream...
متن کاملPerformance Analysis and Optimization of the OP2 Framework on Many-Core Architectures
This paper presents a benchmarking, performance analysis and optimization study of the OP2 ‘active’ library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targetin...
متن کاملPerformance Analysis and Optimisation of the OP2 Framework on Many-core Architectures
This paper presents a benchmarking, performance analysis and optimisation study of the OP2 “active” library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targetin...
متن کاملHigh Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation
Image segmentation is one of the most common steps in digital image processing. The area many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task due to mainly the noise, low contrast, and steep...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014